Optimality Criteria in Reinforcement Learning
Author
Sridhar Mahadevan
Department of Computer Science and Engineering, University of South Florida, Tampa, Florida 33620
[email protected]

Abstract
Embedded autonomous agents, such as robots or softbots, are faced with solving sequential decision problems. Reinforcement learning (RL) is a particular approach to sequential decision problems, in which the agent-environment interaction is modeled as a Markov decision process (MDP). RL agents solve sequential decision problems by learning optimal policies for choosing actions. Thus, at the core of RL is the definition of what it means for a policy to be "optimal". In this paper, we discuss a variety of optimality metrics from the dynamic programming literature, and examine their suitability for RL. We also discuss the challenges in devising RL algorithms for the various metrics.

Sequential Decision Tasks
In recent years, a unifying viewpoint based on embedded autonomous agents has shaped much work in artificial intelligence (Kaelbling 1993; Russell & Norvig 1994). Examples of such agents range from robots operating in unstructured environments and softbots navigating the Internet to industrial controllers operating complex machinery. Such agents face a difficult sequential decision problem, as illustrated in Figure 1. At every time instant, the agent observes the world to be in some state (perhaps imperfectly), and must choose an action that (stochastically) alters the state of the environment. The goal of the agent is to behave optimally; that is, it needs to choose actions using a policy (either learned or programmed by its designer). The environment typically supplies the agent with some feedback regarding its performance, such as a reward or cost function. In this paper, we will be primarily interested in infinite-horizon decision tasks, where the agent continues to operate forever. The special case of tasks that have ...

(Paper to be presented at the AAAI Fall Symposium on Learning Complex Behaviors in Adaptive Intelligent Systems, Nov. 9-11, 1996, MIT, Boston.)

Figure 1: The interaction loop between the autonomous agent and the world.
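As a concrete illustration of two of the optimality criteria the paper discusses, the short Python sketch below evaluates a fixed policy on a small two-state MDP under both the discounted criterion and the average-reward criterion. The MDP, the discount factor, and all numbers are illustrative assumptions, not taken from the paper.

import numpy as np

# Hypothetical two-state MDP under a fixed policy pi:
# P[s, s'] is the transition probability, r[s] the expected one-step reward.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
r = np.array([1.0, 0.0])

# Discounted optimality: V_gamma = (I - gamma * P)^(-1) r
gamma = 0.95
V_gamma = np.linalg.solve(np.eye(2) - gamma * P, r)

# Average-reward (gain) optimality: rho = sum_s d(s) r(s),
# where d is the stationary distribution of P (left eigenvector for eigenvalue 1).
eigvals, eigvecs = np.linalg.eig(P.T)
d = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
d = d / d.sum()
rho = d @ r

print("discounted values:", V_gamma)
print("average reward (gain):", rho)

A discount-optimal policy maximizes V_gamma for the chosen gamma, while a gain-optimal policy maximizes rho regardless of any discounting; the sketch only shows how the two quantities are computed for one policy.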
Similar resources
Low-Area/Low-Power CMOS Op-Amps Design Based on Total Optimality Index Using Reinforcement Learning Approach
This paper presents the application of reinforcement learning in automatic analog IC design. In this work, a multi-objective approach based on learning automata is evaluated for accommodating the required functionalities and performance specifications while optimally minimizing MOSFET area and power consumption for two well-known CMOS op-amps. The results show the ability of the proposed method to ...
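The abstract above does not reproduce the update rule; as a minimal sketch of the kind of step a learning automaton performs, assuming the standard linear reward-inaction (L_R-I) scheme with made-up parameter names and values (not taken from the paper), one iteration might look like this:

import numpy as np

def reward_inaction_update(p, chosen, rewarded, lam=0.1):
    """Linear reward-inaction (L_R-I) update for a learning automaton.

    p        : current action-probability vector
    chosen   : index of the action that was just tried
    rewarded : True if the environment judged the trial favourable
    lam      : learning rate (hypothetical value)
    """
    p = p.copy()
    if rewarded:
        # Move probability mass toward the rewarded action; do nothing on penalty.
        p = (1.0 - lam) * p
        p[chosen] += lam
    return p

# Example: three candidate transistor sizings, uniform initial probabilities.
probs = reward_inaction_update(np.ones(3) / 3, chosen=1, rewarded=True)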
Local Optimization for Simulation of Natural Motion
Reinforcement Learning (Sutton and Barto 1998) is a theoretical framework for optimizing the behavior of artificial agents. The notion that behavior in the natural world is in some sense optimal is explored by areas such as biomechanics and physical anthropology. These fields propose a variety of candidate optimality criteria as possible formulations of the principles underl...
Sensitive Discount Optimality: Unifying Discounted and Average Reward Reinforcement Learning
Research in reinforcement learning (RL) has thus far concentrated on two optimality criteria: the discounted framework, which has been very well-studied, and the average-reward framework, in which interest is rapidly increasing. In this paper, we present a framework called sensitive discount optimality which offers an elegant way of linking these two paradigms. Although sensitive discount optima...
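For reference, the sensitive discount (n-discount) optimality criteria alluded to above are usually stated as follows; this is the standard formulation from the dynamic programming literature, not a quotation from the paper itself.

A policy $\pi^*$ is said to be \emph{$n$-discount optimal} ($n = -1, 0, 1, \dots$) if, for every policy $\pi$ and every state $s$,
\[
  \liminf_{\gamma \to 1^{-}} \; (1-\gamma)^{-n}\,\bigl[\,V^{\pi^*}_{\gamma}(s) - V^{\pi}_{\gamma}(s)\,\bigr] \;\ge\; 0 ,
\]
where $V^{\pi}_{\gamma}$ denotes the discounted value function of $\pi$. The case $n = -1$ recovers average-reward (gain) optimality, $n = 0$ gives bias optimality, and a policy that is $n$-discount optimal for all $n$ is Blackwell optimal.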
Anytime Self-play Learning to Satisfy Functional Optimality Criteria
We present an anytime multiagent learning approach to satisfy any given optimality criterion in repeated game self-play. Our approach stands in contrast to classical learning approaches for repeated games, namely learning of equilibrium, Pareto-efficient learning, and their variants. The comparison is given from a practical (or engineering) standpoint, i.e., from the point of view of a multiagent system...
Learning to Play in a Day: Faster Deep Reinforcement Learning by Optimality Tightening
We propose a novel training algorithm for reinforcement learning which combines the strength of deep Q-learning with a constrained optimization approach to tighten optimality and encourage faster reward propagation. Our novel technique makes deep reinforcement learning more practical by drastically reducing the training time. We evaluate the performance of our approach on the 49 games of the ch...
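As a rough sketch of the tightening idea (the exact loss used in the paper is not reproduced here, and all names are illustrative), one can compute multi-step lower bounds on Q(s_j, a_j) from a stored trajectory and penalize the learned Q-value for falling below the tightest of them:

import numpy as np

def tightest_lower_bound(rewards, bootstrap_values, j, gamma=0.99, K=4):
    """Largest k-step lower bound on Q(s_j, a_j) from one trajectory.

    rewards[t]          : reward received after step t
    bootstrap_values[t] : max_a Q_target(s_t, a) for each visited state
    Bound for horizon k : r_j + gamma*r_{j+1} + ... + gamma^(k-1)*r_{j+k-1}
                          + gamma^k * bootstrap_values[j+k]
    """
    best = -np.inf
    ret = 0.0
    for k in range(1, K + 1):
        if j + k >= len(bootstrap_values):
            break
        ret += gamma ** (k - 1) * rewards[j + k - 1]
        best = max(best, ret + gamma ** k * bootstrap_values[j + k])
    return best

During training, a penalty term such as max(0, L_max - Q(s_j, a_j))**2 could then be added to the ordinary Bellman error, which is the spirit of the constrained optimization described in the abstract.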
Multi-criteria Reinforcement Learning
"Fe consider multi-criteria sequential decision making problems ,,,,here the vector-valued evaluations arc cOluparcd by it given, fixed total order ing. Conditions for the optimality of stationary policies and the Bell lUan optimality eqnation arc given for a special, hut importrmt cla...,s of problems when the evaluation of policies can be computed for the cri teria independently of each ot...
Publication date: 1996